---
title: "Statistical data visualizations (interactive)"
subtitle: "BIOF 440"
author: "Abhijit Dasgupta"
---

Why dynamic/interactive data visualizations

.footnote[Stephen Few, Data Visualization: Past, Present and Future, 2007]

Visual analytics

Visual analytics

.footnote[Tableau, Why Visual Analytics?]

Visual analytics

We will still use principles of good visual encodings

We can add

Visual analytics

As a reminder, we'll still use the usual geometries to encode data

Visual analytics

Kinds of interactions

  1. Scroll and pan
  2. Zoom
  3. Open and close
  4. Sort and re-arrange
  5. Search and filter

.footnote[Jennifer Tidwell]

Visual analytics

Ben Shneiderman proposed three principles for interactive/dynamic graphics:

  1. Overview first
  2. Zoom and filter
  3. Details on demand

--

This can be translated as

  1. Give an overview of the data in a single plot
  2. Zoom, select, and move around in the space of the plot
  3. Add information (meta-data), usually through tooltips

Visual analytics

The treemap was one of the first interactive tools to move from research to business (Ben Shneiderman)

Visual analytics

The Gapminder data, made famous by Hans Rosling, provides an opportunity to show an example of several aspects of interactive data visualization

Visual analytics

The most popular application for dynamic data visualization is Tableau.

However, using Python, or R, or other programming languages to create these visualizations allows you to unify the data science pipeline that includes description, analysis and visualization

Interactive/dynamic data visualizations in Python

A caveat

In this module we won't talk about animations, i.e., sequences of graphics that show a flow of data over time. Animations are dynamic, but not necessarily interactive.

We will look briefly at animations later in the term

Python packages for interactive visualizations

The main packages for interactive visualizations in Python are

  1. plotly
  2. bokeh
  3. altair

In addition, there are several others, including mpld3 (using d3.js), pygal, and holoviews.

Python packages for interactive graphics

We will explore plotly and altair in this class. In week 6 we'll provide resources for bokeh and for holoviews, which can create graphics using matplotlib, bokeh, or plotly as its backend.

Both plotly and altair have a coding schema (API) that makes the mappings from the data to the visualization explicit, leading to an easier mental model for creating interactive graphics

Setting up Python

plotly

plotly

plotly.js is a popular JavaScript interactive visualization library built on d3.js

The company behind plotly.js developed both Python and R interfaces to create interactive graphics using plotly.js

We'll concentrate here on statistical visualizations using plotly, but the documentation will show many other kinds of graphics that can be generated.

plotly

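The Python interface to plotly includes two tracks:

  1. plotly.express, a high-level interface that builds a whole figure in a single function call
  2. plotly.graph_objects, a lower-level interface for building and customizing figures piece by piece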

Often it's easier to start a graphic with plotly.express, and then customize it with elements from plotly.graph_objects.

plotly

We'll start with examples using the penguins data, which we will grab from the seaborn package as a pandas DataFrame

plotly

Let's start with a basic scatterplot

Note that we can mouse over points to get some information, in this case, the x- and y-coordinates

plotly

We'll now add species encoded as color

Note that plotly provides pan, zoom, toggling groups on and off (by clicking the legend), selection, and tooltips automatically

plotly

We can also add marginal plots with additional arguments

plotly

Add regression lines

plotly

Add regression lines

plotly

We can also do trellis graphics pretty easily using plotly

plotly

plotly

We can clean up the plot

Tooltips

Tooltips

You can add data from column(s) of the DataFrame as tooltips quite easily

Tooltips

Choose which variables go into the tooltip

Tooltips

You can change the appearance of the tooltip

Tooltips

You can also provide a template for the tooltips

Univariate plots

Distributional plots

Density plots

We use a trick to create a density plot from a violin plot

Density plots

The plotly.express and plotly.graph_objects paradigms don't currently create density plots directly. Another module, plotly.figure_factory, provides a function (create_distplot) that does. The figure factories are generally deprecated, but are kept to fill gaps in the other paradigms.

Frequency bar plots

Grouped bar charts

Stacked bar charts

Stacked bar charts

Grouped bar chart

Percent bar chart

For the percent bar chart you have to compute the percentages yourself before creating the bar chart.

plotly

Scatterplot matrices

Continuous vs categorical

Boxplots

Violin plot

Strip plot

Grouped violin plots

Parallel coordinates plot

For parallel coordinates plots, the values of the categorical variable must first be transformed to numeric codes

More dynamism

Sliders

Altair

Altair

Altair provides a Python wrapper around the Vega-Lite JavaScript library, which is built on Vega and, in turn, on the famous d3.js.

It provides a syntax that explicitly describes the visual encodings that will be put on a plot.

This syntax is different from plotly's, but it is clear in its own way.

Altair

Automatic aggregations

Aggregation

The aggregation can also be specified more explicitly.

Univariate plots

Histograms

Density plots

Frequency bar plots

Bivariate plots

Scatter plots

Boxplots

Violin plots

Violin plots, like density plots, are a little trickier, since you have to compute the density yourself using the transform_density method.

Strip plots

Adding layers

Scatter plots

Scatter plots

Scatter plots + tooltip

Scatter plots + tooltip

See here for details on formatting

Bar plots

Bar plots

Stacked bar charts

Stacked bar charts

Stacked bar charts

Parallel coordinates

Adding layers explicitly

Scatter plots + lines

Scatter plots + lines

Facets

Scatterplot matrix

Putting charts together

Side-by-side

In a column

Composite plots

Interactivity

Brushing

Brushing

Multi-line highlight

Resources

  1. Encodings
  2. Geometries
  3. Transformations